Bidirectional Bengali Script and Meetei Mayek Transliteration of Web Based Manipuri News Corpus

Author

  • Thoudam Doren Singh
Abstract

Transliteration has attracted interest from several research communities. Several transliteration techniques have been developed and used, both statistical and rule based. In the present work, a simple but effective rule based technique is developed for transliteration between the Bengali script and the Meetei Mayek script of written Manipuri text. Typically, transliteration is carried out between two different languages, one as the source and the other as the target. For languages that use more than one script, however, transliteration between the scripts becomes essential. This is why the present task is carried out between Bengali script and Meetei Mayek for the Manipuri language. The proposed rule based approach points out the importance of integrating deeper linguistic rules into the process by making use of the monosyllabic characteristics of the Manipuri language. The Bengali script to Meetei Mayek transliteration system based on the proposed model gives higher precision and recall than the statistical model. In contrast, for the reverse transliteration, the statistical approach gives higher precision and recall than the rule based approach.
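As a rough illustration of what a rule based script mapping can look like, the following minimal sketch pairs a character table with one positional rule. It is not the rule set from the paper: the five consonants chosen, the function names `transliterate_word` and `transliterate`, and the simplifying assumption that a word-final consonant takes the Meetei Mayek lonsum (final) form are all hypothetical choices made for this example.

```python
# Toy Bengali -> Meetei Mayek mapping. Illustrative subset only;
# this is NOT the rule set described in the paper.

# Consonants in non-final position (Unicode code points given as escapes).
INITIAL = {
    "\u0995": "\uABC0",  # BENGALI KA -> MEETEI MAYEK KOK
    "\u09B2": "\uABC2",  # BENGALI LA -> MEETEI MAYEK LAI
    "\u09AE": "\uABC3",  # BENGALI MA -> MEETEI MAYEK MIT
    "\u09AA": "\uABC4",  # BENGALI PA -> MEETEI MAYEK PA
    "\u09A8": "\uABC5",  # BENGALI NA -> MEETEI MAYEK NA
}

# Assumed rule: the same consonants at the end of a word use the lonsum form.
FINAL = {
    "\u0995": "\uABDB",  # KOK LONSUM
    "\u09B2": "\uABDC",  # LAI LONSUM
    "\u09AE": "\uABDD",  # MIT LONSUM
    "\u09AA": "\uABDE",  # PA LONSUM
    "\u09A8": "\uABDF",  # NA LONSUM
}


def transliterate_word(word):
    """Map one word character by character, applying the word-final rule."""
    out = []
    last = len(word) - 1
    for i, ch in enumerate(word):
        if i == last and i > 0 and ch in FINAL:
            out.append(FINAL[ch])
        else:
            # Characters outside the toy table pass through unchanged.
            out.append(INITIAL.get(ch, ch))
    return "".join(out)


def transliterate(text):
    """Transliterate space-separated words independently."""
    return " ".join(transliterate_word(w) for w in text.split(" "))
```

A realistic system would need vowel signs, conjuncts, and syllable-level context, which is where the monosyllabic structure mentioned in the abstract would come in; this sketch only shows the table-plus-rule shape.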


Similar resources

Taste of Two Different Flavours: Which Manipuri Script works better for English-Manipuri Language pair SMT Systems?

A statistical machine translation (SMT) system depends heavily on the sentence aligned parallel corpus and the target language model. This paper points out some of the core issues in switching a language's script and the repercussions for phrase based statistical machine translation system development. The present task reports on the outcome of English-Manipuri language pair phrase based SMT t...


Automatic Segmentation of Manipuri (Meiteilon) Word into Syllabic Units

This paper demonstrates the automatic segmentation of a Manipuri (or Meiteilon) word into syllabic units. Manipuri is a scheduled Indian language of Tibeto-Burman origin and is also highly agglutinative. The language uses two scripts: the Bengali script and Meitei Mayek. The present work is based on the second script. An algorithm is design...


Manipuri Morpheme Identification

The morphemes of a Manipuri word are the real bottleneck for any Manipuri Natural Language Processing (NLP) work. Manipuri is one of the Indian Scheduled Languages with less advancement so far in terms of NLP applications. This is because the language is highly agglutinative. Segmenting a word and identifying its morphemes becomes necessary before proceeding for any of the...


Web Based Manipuri Corpus for Multiword NER and Reduplicated MWEs Identification using SVM

A web based Manipuri corpus is developed for identification of reduplicated multiword expressions (MWEs) and multiword named entity recognition (NER). Manipuri is one of the rarely investigated languages, and its resources for natural language processing are not available in the required measure. The web content in Manipuri is also very limited. News corpus from a popular Manipuri news website is coll...


Recognition of Handwritten Numerals of Manipuri Script

In this paper, a support vector machine based handwritten numeral recognition system for the Manipuri script (Meetei Mayek) is investigated. We have used various feature extraction techniques such as background directional distribution (BDD), zone based diagonal, projection histograms and Histogram Oriented Gradient features. In Background Directional Distribution (BDD) features background distributi...




Publication date: 2012